ASURA: Scalable and Uniform Data Distribution Algorithm for Storage Clusters
Author
Abstract
In algorithm management, data are distributed in accordance with a data distribution algorithm that can determine, on the basis of the datum ID, the node in which the required data are stored. The requirements for a data distribution algorithm include short calculation times, low memory consumption, uniform data distribution in accordance with the capacity of each node, and the ability to handle the addition or removal of nodes. This paper presents a data distribution algorithm called ASURA (Advanced Scalable and Uniform storage by Random number Algorithm), which satisfies these requirements. It offers roughly 0.6-μs calculation time, kilobyte-order memory consumption, less than 1% maximum variability between nodes in data distribution, data distribution in accordance with the capacity of each node, and optimal data movement to maintain data distribution in accordance with node capacity when nodes are added or removed. This paper compares ASURA qualitatively and quantitatively with representative data distribution algorithms: Consistent Hashing and Straw Buckets in CRUSH. The comparison results show that, by virtue of the uniformity of its distribution, ASURA can achieve the same storage cluster capacity as Consistent Hashing with dozens fewer nodes at the same level of calculation time. They also show that the execution time of ASURA is shorter than that of Straw Buckets in CRUSH. The results reveal that ASURA is the best algorithm for large-scale storage cluster systems.
Growth in the size of data managed by computers is leading to increased storage system capacity. Recent capacities cannot be achieved with one storage node or a few storage nodes. Thus, storage cluster technologies and distributed storage technologies, which manage many storage nodes as one storage system, are urgently required.
In a storage cluster, each node knows all the nodes in the cluster, whereas in a distributed storage system (a peer-to-peer (P2P) system) each node knows only some of the nodes. Storage clusters are the focus of this paper. Because applications need to know data-node correspondences in order to access data, all combinations of data identifiers (IDs) and data-storing nodes must be managed. When data need to be accessed, the node storing them is determined on the basis of the relevant datum ID. There are three types of combination management: table management, algorithm management, and a mixture of both. In table management, combinations of data IDs and data-storing nodes are recorded in a management table. When a datum …
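The difference between table management and algorithm management described above can be illustrated with a short Python sketch. It is not ASURA itself, whose internals are not given in this excerpt. For the algorithm-management side it uses a basic Consistent Hashing ring with virtual nodes, one of the baseline algorithms named in the abstract. The class names, the `capacities` mapping, and the `vnodes_per_unit` parameter are all hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Map a string to a 64-bit integer that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TableManagement:
    """Table management: every datum ID -> node pair is stored explicitly,
    so memory use grows with the number of data."""

    def __init__(self):
        self.table = {}

    def store(self, datum_id: str, node: str) -> None:
        self.table[datum_id] = node

    def locate(self, datum_id: str) -> str:
        return self.table[datum_id]


class ConsistentHashRing:
    """Algorithm management (illustrative): the node is computed from the
    datum ID alone, using a Consistent Hashing ring with virtual nodes."""

    def __init__(self, capacities: dict, vnodes_per_unit: int = 100):
        # capacities: hypothetical mapping of node name -> integer capacity.
        # A node with more capacity gets proportionally more ring points.
        points = []
        for node, capacity in capacities.items():
            for i in range(capacity * vnodes_per_unit):
                points.append((stable_hash(f"{node}#{i}"), node))
        points.sort()
        self._keys = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def locate(self, datum_id: str) -> str:
        # The first ring point clockwise from the datum's hash owns the datum.
        idx = bisect.bisect_right(self._keys, stable_hash(datum_id))
        return self._nodes[idx % len(self._keys)]


if __name__ == "__main__":
    before = ConsistentHashRing({"node-a": 1, "node-b": 1, "node-c": 2})
    after = ConsistentHashRing({"node-a": 1, "node-b": 1, "node-c": 2, "node-d": 1})
    ids = [f"datum-{i}" for i in range(2000)]
    moved = [d for d in ids if before.locate(d) != after.locate(d)]
    print(f"{len(moved)} of {len(ids)} data moved after adding node-d")
```

In this sketch, adding a node only moves data onto the new node, because the ring points of the existing nodes do not change. How uniform the resulting distribution is depends on the number of virtual nodes per node, which is a trade-off against memory consumption.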
Similar resources
Efficient, balanced data placement algorithm in scalable storage clusters
Data distribution and load balancing become increasingly important in large-scale distributed storage systems. This paper focuses on the problem of designing optimal, self-adaptive strategies for balanced distribution and reorganization of replicated objects among dynamically heterogeneous nodes, and presents a novel decentralized algorithm, Dynamic Interval Mapping, which maps replicated o...
RDIM: A Self-adaptive and Balanced Distribution for Replicated Data in Scalable Storage Clusters
As storage systems scale from a few storage nodes to hundreds or thousands, data distribution and load balancing become increasingly important. We present a novel decentralized algorithm, RDIM (Replication Under Dynamic Interval Mapping), which maps replicated objects to a scalable collection of storage nodes. RDIM distributes objects to nodes evenly, redistributing as few objects as possible w...
Data clustering based on key identification
Clustering has been one of the main building blocks in the fields of machine learning and computer vision. Given a pair-wise distance measure, it is challenging to find a proper way to identify a subset of representative exemplars and their associated cluster structures. The recent trend in big data analysis places a more demanding requirement on new clustering algorithms to be both scalable and accura...
RUSH: Balanced, Decentralized Distribution for Replicated Data in Scalable Storage Clusters
Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a decentralized algorithm, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or ...
Energy optimization based on routing protocols in wireless sensor network
Considering the significant role that routing protocols play in the transfer rate, in choosing the optimum path for the exchange of data packets, and in the amount of energy consumed, the present study has focused on developing an efficient compound energy algorithm based on cluster structure, called active node with cluster structure. The purpose of this ...
Journal title:
- CoRR
Volume: abs/1309.7720
Publication date: 2013